ABSTRACT

The area investigated in the present study is the comparison of the
permutation t-test with Student's t-test and the Mann-Whitney U-test. The
comparison was made for small samples for three distributions: a normal
distribution, a uniform distribution, and a skewed distribution. The
properties of each test compared were the probability of a Type I error and
the power against a location-shift alternative hypothesis. The present
research indicates that the permutation t-test is an acceptable statistical
procedure for the two-sample problem for the normal and uniform populations
and suggests that it might be more desirable than the traditional Student's
t-test when sample sizes are proportional to the means and the parent
population is nonnormal and asymmetric. Further research is needed before a
more definite statement can be made about the permutation t-test when
sampling from nonnormal populations. (Author)

The Wisconsin Research and Development Center for Cognitive Learning
focuses on contributing to a better understanding of cognitive learning by
children and youth and to the improvement of related educational practices.
The strategy for research and development is comprehensive. It includes
basic research to generate new knowledge about the conditions and processes
of learning and about the processes of instruction, and the subsequent
development of research-based instructional materials, many of which are
designed for use by teachers and others for use by students. These materials
are tested and refined in school settings. Throughout these operations
behavioral scientists, curriculum experts, academic scholars, and school
people interact, insuring that the results of Center activities are based
soundly on knowledge of subject matter and cognitive learning and that they
are applied to the improvement of educational practice.

This Technical Report is from the Quality Verification Program, whose
principal function is to identify and invent research and development
strategies taking into account current knowledge in the field of statistics,
psychometrics and computer technology. The Quality Verification Program
collaborates in applying such strategies in research and development. The
translation of theory into practice and presentations of exemplars of
methodology are challenges which the Quality Verification Program strives
to meet.

INTRODUCTION

A frequently encountered design in educational and psychological
research is that which compares some characteristic of two populations.
The comparison is usually made by drawing a sample from each of two
populations, obtaining a measure of some characteristic of each and testing
some function of the measures. If the experimenter desires to test the
hypothesis that the population means are equal, then a test statistic
commonly used is Student's t-test for two independent samples (Student,
1908). Student's t-test is the statistical procedure chosen most often
for the two-sample problem because of a general property of statistical
tests: power. The power of a statistical test is the probability of
rejecting the null hypothesis given that some alternative hypothesis of
interest is true. Another general property affecting the choice of a
statistical procedure is the probability of rejecting the null hypothesis
falsely, usually known as the probability of a Type I error. The level of
the probability of a Type I error is chosen by the experimenter before
the experiment takes place. If both populations are normally distributed with
equal variances and the alternative hypothesis of interest is that the
populations differ only in location, then Student's t-test has the highest
power of the available statistical procedures for this situation. Under
these conditions, the probability of a Type I error will be exactly the
level set by the experimenter.
Thus, if an experimenter is sampling from normal populations with
equal variances, and testing a hypothesis of equal population means against
a location-shift alternative, Student's t-test is the best statistical
test on the basis of power. However, if the populations from which the
samples are drawn are not normal, or do not have equal variances, the
experimenter might be led to choose a statistical procedure other than
Student's t-test. The experimenter would specify the probability of a
Type I error and would want to choose the statistical procedure having
the highest power for his experimental situation.

A general class of statistical procedures which do not assume normality
and which might have high power and an exact probability of a Type I error
for non-normal populations are those called distribution-free tests. These
tests are not entirely distribution-free because they assume a continuous
distribution, although it need not be normal. Two distribution-free tests
for the two-sample case are the Mann-Whitney U-test (Mann & Whitney, 1947)
and the permutation t-test. The permutation t-test is based upon a
distribution obtained by calculating the t-statistic for each permutation of
the data. The Mann-Whitney U-test is based upon the ranks of the
observations, rather than on the observations themselves. It is of interest
to the educational or psychological researcher to know the power of the
permutation t-test and the power of the Mann-Whitney U-test against a
location-shift alternative for the population with which he is working.
Knowing the power and probability of a Type I error of the permutation
t-test, the Mann-Whitney U-test and Student's t-test for various populations
will allow the experimenter to choose one of the three statistical
procedures.
For a normal population it is of interest to know how much power would
be lost if the permutation t-test or the Mann-Whitney U-test were used
instead of Student's t-test. For a non-normal population, it is of
interest to know if the power of the permutation t-test or the Mann-Whitney
U-test is larger than the power of Student's t-test. Thus, the populations
from which the experimenter could sample might be distributed as the normal,
uniform (non-normal but symmetric) and skewed (non-normal and asymmetric)
distributions. Knowing the power and probability of a Type I error for
the Mann-Whitney U-test, the permutation t-test and Student's t-test for
these populations would allow the experimenter to choose one of these three
statistical procedures. The present research compares Student's t-test,
the Mann-Whitney U-test and the permutation t-test on the probability of a
Type I error and the power against a location-shift alternative for the
normal, uniform and skewed populations.

The following review of the literature includes a discussion of
hypothesis testing in the two-sample case and a detailed discussion of
Student's t-test, the Mann-Whitney U-test and the permutation t-test.

Review of the Literature 

The two-sample problem is frequently encountered in applied research. 
Several hypotheses may be made for this design, depending upon the charac- 
teristic of the population which the experimenter desires to test. If one 
desires to test differences between means, the null hypothesis to be tested 
is that the population means are equal. However, if one desires to test
merely that the populations are different, then the null hypothesis to be
tested is that the two independent samples were drawn from the same
populations with the same distribution. In the present research, the
populations from which the samples were drawn have been specified so the null
hypotheses of equal means and equal populations may be considered to be
equivalent. The extent to which this equivalence holds is dependent upon
the alternative under consideration. The alternative used in the present
research was that the populations differed only in location. Thus, the mean
of one population was of value μ and the other population, shifted in
location by an amount δ, with δ > 0, had a mean of μ + δ. Thus, only
one-tailed tests are considered.

Many statistical procedures have been proposed to test hypotheses 
of equivalent distributions or hypotheses of equal means. Festinger (1946), 
Fisher (1925), Kolmogorov (1941), Mann and Whitney (1947), Mood (1950), 
Pearson (1911), Pitman (1937a), Smirnov (1948), Wald and Wolfowitz (1940), 
and Wilcoxon (1945) have all given statistical procedures to test the 
hypothesis of equivalent distributions. Student (1908) presented a statistic 
whose sampling distribution can be used to test the hypothesis that the 
means of two normal populations with equal variances are equal.

The statistical procedures included in the present research may be
classified on several dimensions. The most obvious classification scheme
is by the hypothesis to be tested, which may be classified by terms often
used erroneously: parametric and non-parametric. The error which is most
often made is that of confusion of the two terms non-parametric (describing
the problem) and distribution-free (describing the statistical method used
to solve the problem while making no assumptions about the form of the
distribution from which the sample was drawn). Both parametric and
non-parametric problems may be solved by statistical methods which may or may
not be distribution-free. The Mann-Whitney U-test is used to test the
hypothesis of equivalent populations (a non-parametric problem) and is a
distribution-free statistical procedure. The permutation t-test (or Pitman
test) is used to test the hypothesis of equality of means (a parametric
problem) and is a distribution-free technique. Student's t-test is used to
test the hypothesis of equality of means (a parametric problem) and is not
distribution-free. Most distribution-free methods were developed for
non-parametric problems and in common usage "non-parametric" is often
substituted for "distribution-free."

Another relevant dimension of classification is the assumptions
necessary to use the test. One rule accompanying this dimension is that a
parametric test in general is more powerful (i.e., sensitive to change in
the factor being tested) than an equivalent non-parametric test if the
assumptions for both tests are met. The assumptions may be concerned with
the distribution from which the sample was drawn, the independence of the
observations or the scale of measurement. It was mentioned above that
Student's t-test is parametric. The assumptions for the t-test are:
independence of observations, normally distributed errors, equality of
variances, and measurement on at least an interval scale. The
meaningfulness of the results of the t-test depends upon meeting these
assumptions. If a researcher knows that certain of these assumptions cannot
be met in his experimental situation, the t-test may not be the appropriate
statistic to be used because another statistic may have higher power than
Student's t-test. Most distribution-free tests assume independence of
observations and an underlying continuous distribution, but do not make
assumptions about the distribution from which the sample was drawn.
Parametric tests are generally more powerful than their distribution-free
counterparts if their assumptions are met. However, it is logical to
question what happens to the statistical test if in fact the assumptions
are not met.

The invariance of the probability of a Type I error (α) when the
assumptions underlying the test have not been met is known as the robustness
of the test (Box and Andersen, 1955). Since parametric tests are most powerful
under normal theory assumptions, there is a strong temptation to use these
tests when the normality of the distribution is in question. Thus, there
has been considerable study of the robustness of parametric tests (Box, 1954a,
1954b; Box and Andersen, 1955) and, correspondingly, there has been
considerable study on the power of non-parametric tests. First, the
robustness of Student's t-test will be considered and literature pertaining
to research done on Student's t-test will be presented. Literature pertaining
to the power of the permutation t-test and the Mann-Whitney U-test will
follow.

Student's t-test 

Student's t-test is used to test the hypothesis of equal population
means for the two-sample problem if the populations are normal and have
equal variances. Student's t-test is most powerful against a location-shift
alternative hypothesis. The test is performed by calculating the
two-independent-sample t-statistic

\[
t = \frac{\bar{X} - \bar{Y}}
         {\sqrt{\dfrac{\sum_{i=1}^{m} (X_i - \bar{X})^2 + \sum_{j=1}^{n} (Y_j - \bar{Y})^2}{m + n - 2}
                \left( \dfrac{1}{m} + \dfrac{1}{n} \right)}}
\]

where $\bar{X}$ is the mean of a sample of size m of $X_i$'s and $\bar{Y}$
is the mean of a sample of size n of $Y_j$'s, and determining the
probability of obtaining a t-statistic larger than or equal to the original
t-statistic by using the tabled t-distribution with m + n - 2 degrees of
freedom. If the probability is less than or equal to the probability of a
Type I error (usually denoted by α) set by the experimenter, the null
hypothesis is rejected. Alternatively, the experimenter may check to see if
the calculated t-statistic is greater than or equal to the tabled t-value
for the probability of a Type I error and m + n - 2 degrees of freedom.
Tables of t are in most elementary statistics texts (see Hays, 1965).
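
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original report: the function name `pooled_t` and the sample values are hypothetical, and the critical value 2.132 is the standard tabled one-tailed t-value for α = .05 with 4 degrees of freedom.

```python
from math import sqrt


def pooled_t(x, y):
    """Two-independent-sample t-statistic with a pooled variance estimate."""
    m, n = len(x), len(y)
    mean_x = sum(x) / m
    mean_y = sum(y) / n
    # Sums of squared deviations about each sample mean.
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    pooled_var = (ss_x + ss_y) / (m + n - 2)
    return (mean_x - mean_y) / sqrt(pooled_var * (1 / m + 1 / n))


# Hypothetical scores for two groups of three subjects each.
x = [12.0, 14.0, 16.0]
y = [9.0, 10.0, 11.0]

t = pooled_t(x, y)
df = len(x) + len(y) - 2

# Tabled one-tailed critical value for alpha = .05 and 4 degrees of freedom.
T_CRIT = 2.132
reject = t >= T_CRIT
print(f"t = {t:.4f} on {df} df, reject H0: {reject}")
```

With these illustrative values the statistic is about 3.098, which exceeds the tabled value, so the one-tailed test rejects the hypothesis of equal means.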

Most research relevant to the robustness of Student's t-test has
been done on the one-way analysis of variance, which is the k-sample
extension of the two independent sample t-test as introduced by Student
(1908). Thus, the analysis of variance research applies to Student's t-test.

Box (1954a) has shown that the one-way analysis of variance, and
therefore the two-sample t-test, is robust to violation of the assumption of
variance homogeneity if sample sizes are equal. If the sample sizes are
unequal, and the variances are also unequal, then the test will have a
probability of a Type I error which is smaller than α if the larger sample
is from the population which has the larger variance. If the smaller sample
came from the population with the larger variance, the test has a
probability of a Type I error which is larger than α.

Considering the assumption of a normal population from which the samples
were drawn, Kendall and Stuart (1967, p. 466) point out that the independence
of the numerator and denominator of the t holds only for normal parent
populations. If the samples are drawn from a non-normal parent population,
the numerator and denominator of the t are not necessarily independent
and the dependence affects the probability of a Type I error. However,
for large sample size, if the parent population is symmetric or if the
samples are of equal size, the t-test is robust to non-normality. Thus the
probability of a Type I error is relatively unaffected. Gayen (1941, 1950)
found these same results. Srivastava (1958) found that the effect of
non-normality on the probability of a Type I error and power of the t-test
was not marked if the skewness and kurtosis were small. Little is said of the
effect of non-normality of the parent population if the sample size is small
for either equal or unequal samples, other than that the t-test should be
relatively robust. When sampling from a normal distribution with small
samples, the power of the t-test may be calculated exactly (see Milton,
1966). In summary, Student's t-test is relatively robust to violation
of assumptions if certain conditions are met. However, in practice it is
often difficult to decide if the use of the t-test is likely to be valid
or misleading. To aid in deciding on the use of the t-test, preliminary
tests have been suggested. The idea of using preliminary tests to determine
if the assumptions have been met has been soundly denounced as poor practice
(Box and Andersen, 1955) due to the fact that the preliminary test itself
then comes under question as to its power with respect to certain factors.
Thus, we would be led to start a long chain of tests each designed to test
assumptions for the preceding one. Box and Andersen instead call for tests
which are robust and able to be used without preliminary checks on
their assumptions.

An alternative to tests which are robust to violations of their
distribution assumptions is the derivation of distribution-free statistical
procedures which can provide answers to the questions of interest. Such
statistical procedures do not assume the observations to be distributed
normally, but merely assume that the distribution is continuous. The
permutation t-test is such a statistical procedure.

Permutation t-test 

The permutation t-test is used to test the hypothesis of equal
population means for the two-sample problem if the populations are
continuous. The populations do not need to be normally distributed. The
permutation t-test is performed by completing the following sequence of
events: obtain all possible arrangements (permutations) of the observed data,
compute the two independent sample t-statistic for each permutation, arrange
the t-statistics in a distribution and determine the probability of obtaining
a t-statistic larger than or equal to the original observed t-statistic
in this distribution. If the probability is less than or equal to the
probability of a Type I error (usually denoted by α) set by the experimenter,
the null hypothesis is rejected. Alternatively, the experimenter may check
to see if the original t-statistic from the observed data is greater than or
equal to the t-statistic which cuts off 100α percent of the distribution in
the upper tail.

Many of the permutations obtained in the above procedure yield the
same statistic. Since it is easier to obtain all possible combinations of
m + n observations divided into groups of m and n, and both procedures yield
the same probabilities for the statistic (see Appendix A), the permutation
t-test may be based on the distribution of the t-statistic calculated for
every possible combination of the observed data. However, this procedure
depends on the experimenter choosing a probability of a Type I error (α)
which divides the number of combinations,

\[
\binom{m+n}{m} = \frac{(m+n)!}{m!\,n!},
\]

evenly, that is, for which α times the number of combinations is a whole
number.
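
The combination-based procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the program used in the original study: the function names and data are hypothetical, the test is one-tailed with rejection for large t, and the data are assumed to contain no split with zero within-group variation (which would make the t-statistic undefined).

```python
from itertools import combinations
from math import comb, sqrt


def pooled_t(x, y):
    """Two-independent-sample t-statistic with a pooled variance estimate."""
    m, n = len(x), len(y)
    mean_x, mean_y = sum(x) / m, sum(y) / n
    ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    return (mean_x - mean_y) / sqrt(ss / (m + n - 2) * (1 / m + 1 / n))


def permutation_t_test(x, y):
    """Exact one-tailed p-value over all splits of the pooled data into m and n."""
    pooled = list(x) + list(y)
    m = len(x)
    t_obs = pooled_t(x, y)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), m):
        chosen = set(idx)
        group_x = [pooled[i] for i in idx]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so that floating-point noise cannot hide a tie.
        if pooled_t(group_x, group_y) >= t_obs - 1e-12:
            extreme += 1
    return extreme / total, total


# Hypothetical data: every x exceeds every y, the most extreme split possible.
x = [12.1, 14.3, 15.0]
y = [9.8, 10.4, 11.9]

p_value, n_splits = permutation_t_test(x, y)
print(f"{n_splits} combinations, one-tailed p = {p_value:.2f}")
```

For m = n = 3 there are 20 combinations, so α = .05 divides the distribution evenly: exactly one split falls in the rejection region, and the illustrative data above attain p = 1/20.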



Permutation tests are based on the fact that any permutation of the
observations has an equal chance of occurrence in the distribution of the
test statistic. The theoretical basis of the permutation t-test is
presented in Scheffé (1943, pp. 307-308). Simply stated, the basis is as
follows: the desired property for a statistical procedure which does not
assume normality of the population is that the statistical procedure must
always yield a region of rejection which has the same probability under
the null hypothesis for every possible distribution of measures of interest.
Permutation tests guarantee this property because the distribution obtained
is based on the data, not on the population, and the probability of the
rejection region is always α.

Before the literature on permutation tests can be evaluated, the power
of permutation tests must be considered. The power of permutation tests
may be generally thought of in two ways: first, as what will be called
an "unconditional power," and second, as a power conditional upon the
observations. The conditional power of permutation tests was not used in the
present research and is included in the present discussion merely for
comparative purposes. There are two types of conditional power of
permutation tests: the fixed cut-off point power and a more general power
given by Kempthorne (1952). The power used in research by Baker and Collier
(1966), Collier and Baker (1966), Kempthorne et al. (1961), and Toothaker
(1967) was the conditional power known as the fixed cut-off point power.
In the fixed cut-off point procedure the observations are permuted, a
specified treatment effect (constant) is added to each observation after
the permutation, and the statistic is computed for each permutation. The
proportion of permutations with the statistic falling above the fixed
cut-off point, usually defined from normal theory for purposes of comparison
with normal theory tests, is the conditional power. The fixed cut-off
point power is dependent upon the observations. No sampling is done and
generalizations may not be made beyond the given set of observations.
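
A rough sketch of the fixed cut-off point calculation is given below. It rests on several assumptions that the text does not spell out: the treatment effect is added only to the observations assigned to the treated group, combinations are enumerated in place of permutations, and the cut-off is the tabled one-tailed t-value 2.132 (α = .05, 4 degrees of freedom). The function names and data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pooled_t(x, y):
    """Two-independent-sample t-statistic with a pooled variance estimate."""
    m, n = len(x), len(y)
    mean_x, mean_y = sum(x) / m, sum(y) / n
    ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    return (mean_x - mean_y) / sqrt(ss / (m + n - 2) * (1 / m + 1 / n))


def fixed_cutoff_power(observations, m, effect, cutoff):
    """Share of splits whose t, after adding the effect, reaches the cut-off."""
    values = list(observations)
    rejections = 0
    total = 0
    for idx in combinations(range(len(values)), m):
        chosen = set(idx)
        # The constant treatment effect is added after the rearrangement.
        treated = [values[i] + effect for i in idx]
        control = [values[i] for i in range(len(values)) if i not in chosen]
        total += 1
        if pooled_t(treated, control) >= cutoff:
            rejections += 1
    return rejections / total


# Hypothetical fixed set of six observations; no sampling is involved.
data = [12.1, 14.3, 15.0, 9.8, 10.4, 11.9]
T_CRIT = 2.132  # tabled one-tailed value, alpha = .05, 4 df

power_null = fixed_cutoff_power(data, 3, 0.0, T_CRIT)
power_small = fixed_cutoff_power(data, 3, 1.0, T_CRIT)
power_huge = fixed_cutoff_power(data, 3, 100.0, T_CRIT)
print(power_null, power_small, power_huge)
```

Because the result is a proportion over rearrangements of one fixed data set, it describes only those observations, which is the limitation the paragraph above points out.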

Also, the fixed cut-off point power is a theoretical power for use primarily
in research on the power of permutation tests and is usually not obtained in
practice. Another conditional power of permutation tests similar to the
fixed cut-off point power is that operationally defined by Kempthorne
(1952, p. 219). In the Kempthorne procedure the observations are permuted,
a specified treatment effect is added to each observation after the
permutation, and the statistic is computed for each permutation. Then for
each permutation the statistic is tested via the permutation test: a
permutation distribution of the statistic for observations plus treatment is
obtained, the original statistic is compared to this distribution and either
an acceptance or a rejection is made. The proportion of the original
permutations for which a rejection is made is the power. The conditional
power given by Kempthorne is also a theoretical power for use in research
on the power of the permutation tests and is not obtained in practice
due to the extensive calculations required.

The power of the permutation test which will be called "unconditional
power" in the present research is based upon random sampling. The rejection
region of the permutation test is conditional upon the observations for each
sample, but the power is the proportion of rejections over repeated sampling
from some population when the null hypothesis is false. The seemingly
illegitimate marriage of a test which was designed to be used on a set of
given observations with traditional sampling may be justified as follows:
the experimenter usually wants to generalize beyond the set of observations
in hand to some population of interest. If the experimenter is going to
use the permutation test, and wants to generalize in the usual way to the
population from which he has sampled, it is of interest to know the power
of the permutation test for repeated sampling from that population. Box
and Andersen (1955) point out the difference between unconditional power and
conditional power of the permutation test:

    Two alternative views of the nature of the inference
    in the permutation test can be taken. These differ
    in the conception of the population of samples from
    which the observed sample is supposed to have been
    drawn. On the first view our attention is confined only
    to that finite population of samples produced by
    rearrangement of observations of the experiment. We
    prefer to adopt the second view which is that the samples
    are regarded as being drawn from some hypothetical
    infinite population in the usual way.

Thus, while the conditional power results from a population dependent upon
the observations, the unconditional power is based on random sampling from
some population. The obvious advantage of unconditional power is the
capability to go beyond the observed data to a population of the statistic
based on samples of the given size. The type of power of permutation tests
used in the present research is the unconditional power. Thus, the power
against the location-shift alternative of the permutation t-test as found
in the present research applies to any sample of a given size from a given
distribution.
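
Unconditional power, as defined above, can be estimated by simulation: draw repeated samples from a population, apply the permutation t-test to each, and record the proportion of rejections. The sketch below is illustrative only and is not the procedure or program of the original study. The normal population, the shift of 2 standard deviations, the sample sizes, the number of replications and all function names are assumptions chosen for the example.

```python
import random
from itertools import combinations
from math import sqrt


def pooled_t(x, y):
    """Two-independent-sample t-statistic with a pooled variance estimate."""
    m, n = len(x), len(y)
    mean_x, mean_y = sum(x) / m, sum(y) / n
    ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    return (mean_x - mean_y) / sqrt(ss / (m + n - 2) * (1 / m + 1 / n))


def permutation_rejects(x, y, alpha):
    """One-tailed permutation t-test decision for a single pair of samples."""
    pooled = x + y
    t_obs = pooled_t(x, y)
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if pooled_t(gx, gy) >= t_obs - 1e-12:
            extreme += 1
    return extreme / total <= alpha + 1e-12


def unconditional_power(shift, m, n, alpha, reps, rng):
    """Proportion of rejections over repeated sampling from normal populations."""
    rejections = 0
    for _ in range(reps):
        # x comes from the population shifted upward by `shift`.
        x = [rng.gauss(shift, 1.0) for _ in range(m)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if permutation_rejects(x, y, alpha):
            rejections += 1
    return rejections / reps


rng = random.Random(0)  # fixed seed so the run is repeatable
type_i_rate = unconditional_power(0.0, 3, 3, 0.05, 2000, rng)
power = unconditional_power(2.0, 3, 3, 0.05, 2000, rng)
print(f"estimated Type I error rate: {type_i_rate:.3f}")
print(f"estimated power at shift 2:  {power:.3f}")
```

With shift set to zero the same routine estimates the probability of a Type I error, which for m = n = 3 and α = .05 should fall close to 1/20, since only the single most extreme of the 20 splits leads to rejection.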

Permutation tests are difficult to perform due to the formidable labor
involved in calculating the statistic for all possible permutations, so
this procedure was not considered practical until the advent of electronic
computers. Because of the lengthy calculations, normal theory tests are
used as an approximation for permutation tests even though the rationale
for the two types of tests is quite different. The reason the approximation
was first suggested was that moment calculations and empirical studies
demonstrated the two types of tests to be similar under certain conditions.
Most of the literature on permutation tests is on the analysis of variance
F-test, and very little is on the permutation t-test. However, results
for the one-way analysis of variance are generally applicable to the
permutation t-test. Fisher (1935) first introduced the permutation or
randomization test as the exact test for testing for differences between
means of two populations when assumptions were not met. Fisher pointed out
that the probability of a Type I error for the permutation t-test closely
approximated the normal theory probability of a Type I error for the
particular problem with which he dealt. Pitman (1937a) was next to consider
permutation tests. For the two-sample problem, Pitman introduced a test
statistic, w, which is a monotonic increasing function of t²,

\[
w = \frac{1}{1 + \dfrac{N - 2}{t^{2}}}, \qquad (2)
\]

where N = m + n, the combined sample size.

Pitman (1937b) and Welch (1937) both derived basic results on the
permutation test for the analysis of variance for the randomized block and
Latin square designs. Both derivations for the analysis of variance held
for large sample size and were based on a comparison of moments of the test
statistic under normal theory and under permutation. For the randomized
block design, Pitman (1937b) and Welch (1937) showed that the F-test may
underestimate the significance level if block variances were not equal.
However, if the number of blocks is large the underestimation is not serious.
Wald and Wolfowitz (1944) derived a general theorem on the limiting
distribution of linear forms in the universe of permutations of the
observations. They showed that the distribution of the test statistic for the
randomized block design is asymptotically the F-distribution underlying
normal theory analysis of variance. For Pitman's test, and thus for the
permutation t-test, Wald and Wolfowitz showed that the distribution of the
test statistic, w, is asymptotically normal. Hoeffding (1952) found that
permutation tests for the randomized block design and for the two-sample
problem are asymptotically as powerful as their related parametric tests.
Thus the permutation test for the randomized block design is asymptotically
as powerful as the normal theory F-test, and the permutation t-test is
asymptotically as powerful as Student's t-test. Scheffé (1959, Chapter 9)
summarized these and other results on permutation tests.

Considerable research has been done on the F-test under permutation in
the analysis of variance for various designs, most of it empirical (see
Baker and Collier, 1966a; Box and Andersen, 1955; Collier and Baker, 1963;
Collier and Baker, 1966; Kempthorne, 1952; Kempthorne et al., 1961; and
Toothaker, 1967). The existing research shows that if the assumptions
are met and if sample size is not small, the probability of a Type I error
and power of the permutation F-test are approximately the same as those
of the normal theory F-test; if the assumptions are not met and if sample
size is not small, the probability of a Type I error and power of the F-test
under permutation are still fairly close to those of the normal theory
F-test, if the violation is not severe.

The study by Box and Andersen (1955) yielded an important result in
the study of permutation tests. Box and Andersen introduced a correction
for the normal theory F-test. When the degrees of freedom are multiplied
by the correction factor, the F-test with the corrected degrees of freedom
is an approximate permutation test. The correction factor corrects for the
non-normality and heterogeneity of variance of the design. Extensions of
this correction procedure have been devised for multivariate situations by
Geisser and Greenhouse (1958). The correction factor of Box and Andersen
was used in an empirical study by Toothaker (1967) to investigate the
joint effect of variance heterogeneity and block treatment interaction on
the F-test under permutation in the randomized block design.

As has been pointed out several times above, most of the research
on permutation tests is for large sample size. The present research deals
with the comparison of the permutation t-test with Student's t-test and the
Mann-Whitney U-test for the normal, uniform and skewed populations for small
sample sizes.

The existing literature on the comparison between the permutation t-test
and Student's t-test involves the comparison of the power of the two tests.
Since Student's t-test is the most powerful test under normal theory, the
power of the distribution-free method, the permutation t-test, can be
compared to the power of the t-test to measure the loss in power when sampling
from a normal distribution. Several measures to compare the power of two
tests are available.

One measure to compare the power of two tests is the relative
efficiency. The relative efficiency of two tests is defined to be the ratio
of the sample sizes necessary to attain the same power against the same
alternative, where the sample size in the numerator is that of the most
powerful test. Siegel (1956) multiplies the relative efficiency by one
hundred and calls it the power efficiency, a more descriptive term. The
most commonly used measure is the asymptotic relative efficiency (ARE),
defined as the limiting relative efficiency of two tests against a sequence
of local alternative hypotheses as the sample size increases. The permutation
t-test has an ARE of 1 when compared to Student's t-test for an
alternative of location shift.
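
The definitions above can be written compactly. The symbols $n_A$ and $n_B$ are introduced here only for illustration, with $n_A$ the sample size required by the most powerful test and $n_B$ the sample size required by the competing test to reach the same power against the same alternative; the numerical values are hypothetical and are not results from the present research.

```latex
\[
\mathrm{RE} = \frac{n_A}{n_B},
\qquad
\text{power efficiency} = 100 \cdot \mathrm{RE}
\]

% Hypothetical worked example: if the most powerful test needs
% n_A = 19 observations and the competing test needs n_B = 20,
\[
\mathrm{RE} = \frac{19}{20} = 0.95,
\qquad
\text{power efficiency} = 95 .
\]
```

An ARE of 1 then says that, in the limit, the two tests need the same number of observations to reach the same power.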

A disadvantage of the permutation t-test is that its exact distribution 
is tedious to enumerate by hand except for very small sample sizes. Also, 
the distribution of the permutation t-test will be different for every 
set of actual observations, which are random variables, making it impossible 
to tabulate the exact permutation distribution of the permutation t-test. 
With the advent of electronic computers, this disadvantage has become less
serious. However, it is still desirable to be able to tabulate the
distribution of the statistic for various sample sizes, and rank tests
satisfy this desire. The Mann-Whitney U-test is a rank test for the
two-sample problem.

Mann-Whitney U-test

One way to remove the variability of the distribution of the test
statistic from one set of observations to another is to replace each
observation, $X_i$, with some value, $Z_i$, for which the permutation
distribution of the statistic is the same for every sample of the same size.
If these values are chosen to maintain the order relations between any two
of the values, $X_i$ and $X_j$, the ranks of the observations are the obvious
choice. A further desirable aspect of the ranks is that they are invariant
under any monotonic transformation of the variable. Therefore, we consider
some function of the ranks of the observations. We define the rank of the
i-th observation to be its position in the set of the ordered observations,
with the smallest receiving the lowest rank. The ordering of the
observations, $X_i$, is one of the N! possible permutations, and the ordering
of the ranks, $Z_i$, is a permutation of the integers one to N. The function
of the ranks which has theoretically desirable properties (see Kendall and
Stuart, 1967) is the sum of the ranks of one of the samples,

$$R = \sum_{i=1}^{n} Z_i,$$

where the ranking is done over the total sample. The Mann-Whitney
U-statistic is based on such a function. The Wilcoxon and Festinger tests
are functions of the U-statistic and thus may be considered equivalent tests.

Rank tests such as the U-statistic are permutation tests. Although
few authors point out the fact, many of the rank tests when calculated in
their small sample or exact form are permutation tests on the ranks of
the observations (see Kruskal and Wallis, 1952, and Wilks, 1962). The rank
permutation test exists not only for the two independent sample case but
also for the two related sample, k independent sample, and k related sample
cases. Rank permutation tests also exist for hypotheses of independence (see
Hotelling and Pabst, 1936; Kendall and Stuart, 1967; Pitman, 1937a; and
Wald and Wolfowitz, 1943). Although only the two independent sample case
is considered in this research, future research is planned for the
remaining cases.

Specifically, a rank-permutation test exists for the Mann-Whitney
U-test. The U-test could be completed for any set of observations by
performing a permutation test on the ranks of the observations. The
probabilities for possible values of U for a given sample size can be
calculated by performing all possible permutations of the ranks for one
sample of size n ($X_i$, where $X_i$ is the i-th observation) and the other
sample of size m ($Y_j$, where $Y_j$ is the j-th observation) and tabulating
the proportion of times a given U value appears.

The Mann-Whitney U-statistic can be defined for a given n and m as the
number of times Y rankings exceed X rankings, or

$$U = \sum_{i=1}^{n} \sum_{j=1}^{m} h_{ij}, \quad \text{where } h_{ij} =
\begin{cases} 1 & \text{if } Y_j > X_i \\ 0 & \text{otherwise} \end{cases}
\qquad (3)$$
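
The enumeration described above can be sketched in a few lines of Python. This is a minimal illustration, not the program used in the study: the function names are hypothetical, untied ranks are assumed, and the splits of the ranks are enumerated as combinations, which gives the same proportions as listing every permutation.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def u_statistic(x, y):
    """Count the (X_i, Y_j) pairs with Y_j > X_i, as in definition (3)."""
    return sum(1 for xi in x for yj in y if yj > xi)


def exact_u_distribution(n, m):
    """Exact null distribution of U for sample sizes n (X) and m (Y).

    Under the null hypothesis every choice of n of the ranks 1..n+m for
    the X sample is equally likely, so each choice is enumerated once.
    """
    ranks = range(1, n + m + 1)
    counts = Counter()
    for x_ranks in combinations(ranks, n):
        chosen = set(x_ranks)
        y_ranks = [r for r in ranks if r not in chosen]
        counts[u_statistic(x_ranks, y_ranks)] += 1
    total = sum(counts.values())
    return {u: Fraction(c, total) for u, c in sorted(counts.items())}
```

For example, `exact_u_distribution(2, 3)` assigns probability 1/10 to each of the extreme values U = 0 and U = 6, since only one of the ten possible splits produces each.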

The calculating formula as given by Mann and Whitney shows the relation
between the U statistic and the sum of the ranks:

$$U = nm + \frac{m(m+1)}{2} - R_m \qquad (4)$$

or

$$U = nm + \frac{n(n+1)}{2} - R_n \qquad (5)$$

where $R_n$ is the sum of the ranks of the n observations in the first sample
and $R_m$ is the sum of the ranks of the m observations in the second sample.
The smaller of (4) and (5) is the tabled value, and the null hypothesis is
rejected in favor of the location-shift alternative if values as large
or larger than the tabled value are found. Tables now exist for the U-test
for probabilities .0005, .001, .0025, .005, .01, .025, .05, and .10 for
$m \le 40$ and $n \le 20$ (Milton, 1964), and the need for exact calculations
via permutation does not exist for small samples. For larger sample sizes,
the normal approximation is ordinarily used, where

$$E(U) = \frac{nm}{2}$$

and

$$VAR(U) = \frac{nm(n+m+1)}{12}$$

The use of the U-statistic is covered in many elementary statistical texts
(see Hays, 1965, and Siegel, 1956) and appears to have heavy usage in all
areas of research (see Savage, 1962).
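
As a brief illustration of the calculating formulas (4) and (5) and of the moments used in the normal approximation, the following Python sketch may help. The function names are hypothetical and the data are assumed to contain no ties; it is a check on the algebra rather than a reproduction of any published program.

```python
def rank_sums(x, y):
    """Rank the pooled observations (no ties assumed); return (R_n, R_m)."""
    pooled = sorted(list(x) + list(y))
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    return sum(rank[v] for v in x), sum(rank[v] for v in y)


def u_from_rank_sums(x, y):
    """Return the two values given by formulas (4) and (5)."""
    n, m = len(x), len(y)
    r_n, r_m = rank_sums(x, y)
    u4 = n * m + m * (m + 1) / 2 - r_m   # formula (4)
    u5 = n * m + n * (n + 1) / 2 - r_n   # formula (5)
    return u4, u5


def normal_approx_moments(n, m):
    """Mean and variance of U under the null hypothesis."""
    return n * m / 2, n * m * (n + m + 1) / 12
```

The two values always sum to nm, so reporting the smaller of the two loses no information. With untied data, formula (5) equals the count of pairs in which a Y observation exceeds an X observation.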

The literature on the Mann-Whitney U-test also includes power comparisons
involving the ARE, as was discussed above when comparing the permutation
t-test to Student's t-test. As mentioned above, Student's t-test is the
most powerful test in the two-sample case if normal theory conditions are
met. Therefore, the power of the U-test is necessarily less than that of
the t-test under normal theory assumptions. Hodges and Lehmann (1956) have
shown that the ARE of the U-test as compared to Student's t-test for a
normal distribution is .95 and may never be less than .864 when the
location-shift alternative is considered. Hodges and Lehmann also report
that the ARE is equal to unity for the uniform distribution. Wetherill (1960)
reports that for a gamma distribution with one degree of freedom the ARE
is three and for an Edgeworth population with a skewness measure of .67
the ARE is unity. So, for non-normal distributions the asymptotic
comparison of the power of the U- and t-tests shows that the power of the
U can be considerably better than that of the t, especially if the
distribution is not symmetric.

Small sample power functions for the Mann-Whitney U-test have been
derived for several distributions and computations done for at least a
few sample sizes. Most of the literature deals with the small sample power
of the U-test for the normal distribution and the location-shift
alternative. Milton (1966) computed extensive tables of the power of various
non-parametric tests against the shift alternative for the normal
distribution and offered a direct comparison with the power of Student's
t-test. The table of the power of the U-test covers all possible sample size
combinations of m,n from 2,1 up to 7,7 for various values of the shift.
Dixon (1954) and van der Vaart (1950) also have dealt with the power of the
U-test and the normal shift alternative. Milton, Dixon and van der Vaart all
show that the small sample power of Student's t-test is close to that of the
Mann-Whitney U-test for the normal shift alternative. Gibbons (1964), Haynam
and Govindarajulu (1966), and Lehmann (1953) have all dealt with the power
of the U-test for distributions other than the normal and/or alternatives
other than location shift. Glazer (1964), Pratt (1964), and van der Vaart
(1961) investigated the effect of differences in population variances on
the probability of a Type I error of the Mann-Whitney U-test and Student's
t-test. The probability of a Type I error of the U-test was less affected
by variance differences than that of the t-test if sample sizes were unequal,
but the t-test fared better than the U-test if m = n. Glazer (1964) reported
that the small sample power of the t-test was larger than the power of the
U-test if m = n or if there were no variance differences. Thus, the U-test
is relatively robust to variance differences if $m \ne n$, when compared to
the t-test.

Considerable research has been cited above on the power and probability
of a Type I error of Student's t-test, the Mann-Whitney U-test, and the
permutation t-test. Most of this research has been asymptotic, with
exceptions being the small sample probability of a Type I error and power of
the U-test for selected distributions and alternatives, and the small sample
probability of a Type I error and power of the t-test for normal
distributions with the shift alternative. There has been essentially no
systematic research done on the small sample probability of a Type I error
and the power of the permutation t-test for any distribution. The present
research investigates empirically the small sample probability of a Type
I error and the power of the permutation t-test for normal, uniform and
skewed distributions with a location-shift alternative. The probability
of a Type I error and power of the Mann-Whitney U-test and Student's t-test
are also calculated empirically for comparison purposes and as a check on
calculations.

After a general restatement of the problem, Chapter II covers the
definition of the power as used in the present study, the procedures for
obtaining the power in the computer program used, and definitions of the
populations.

II

NATURE AND STRUCTURE OF THE PROBLEM

The area investigated in the present study is the comparison of the
permutation t-test with Student's t-test and the Mann-Whitney U-test. The
comparison was made for small samples for three distributions including a 
normal distribution, a uniform distribution and a skewed distribution. 
The properties of each test compared were the probability of a Type I 
error and the power against a location-shift alternative hypothesis. 

Power is generally defined as the probability of a rejection if the
alternative is true. More specifically, if X is an observed sample point,
$\omega$ is the critical region of the test, $H_0$ represents the null
hypothesis of equal population means, and $H_1$ represents the
location-shift alternative, then

$$p(X \in \omega \mid H_0) = \alpha$$

and

$$p(X \in \omega \mid H_1) = 1 - \beta = \text{power}$$

where $\alpha$ is the probability of a Type I error and $\beta$ is the
probability of a Type II error. The choice of $\omega$, the critical region,
and X, the sample point, depends on the test under consideration. For
specific definitions of the power of Student's t-test, the Mann-Whitney
U-test and the permutation t-test, the sample point and the critical region
must be given in the definition for each. The power of the three statistical
procedures in the present study is the unconditional power, which is based
on random sampling from some population. However, it should be pointed out
that the rejection regions for the Mann-Whitney U-test and Student's t-test
are not conditional on the data for any population as is the rejection
region for the permutation t-test.

For Student's t-test, a normal theory test, $\omega$ is chosen as the top
$100\alpha$-percent of the theoretical t distribution. The observed sample
point, X, is the two-independent-sample t-statistic given in (1), above. The
test is given by rejecting $H_0$ if t is contained in the rejection region,
$\omega$, otherwise failing to reject $H_0$. The power is then the proportion
of rejections over an infinite number of samples and tests of $H_0$, when the
location-shift alternative is true.

For the Mann-Whitney U-test, a permutation test on the ranks of the
observations, $\omega$ is chosen as the top $100\alpha$-percent of the
distribution of U obtained by calculating U for each permutation of the
ranks of the observations. The observed sample point, X, is the U-statistic
for the observed data, and the test is given by rejecting $H_0$ if the U from
the observed data is contained in the rejection region, $\omega$, otherwise
failing to reject $H_0$. The power is the proportion of rejections over an
infinite number of samples and tests of $H_0$, when the location-shift
alternative is true.

For the permutation t-test, a permutation test on the observations,
$\omega$ is chosen as the top $100\alpha$-percent of the distribution of t
obtained by calculating t for each permutation of the observations. The
observed sample point, X, is the t-statistic given by formula (1), and the
test is given by rejecting $H_0$ if the t from the observed data is contained
in the rejection region, $\omega$, otherwise failing to reject. Then, the
power is the proportion of rejections over an infinite number of samples and
tests of $H_0$, when the location-shift alternative is true.

From the above definitions of the unconditional power of Student's
t-test, the Mann-Whitney U-test and the permutation t-test, procedures
were developed for obtaining estimates of the power and were implemented
in the computer program used in the present research. For all three
statistical procedures the sampling part of the power procedure was identical
and the statistics were all computed on the same observations. A random
sample of size n was drawn from a population with mean $\mu$ and a second
random sample of size m was drawn from a population with mean $\mu + \theta$.
Both populations were identical except for the location parameter. For
Student's t-test, the t-statistic was computed and the null hypothesis of
equal means was rejected if the value of the t-statistic was larger than the
tabled $100\alpha$-percent value from the t-distribution with m+n-2 degrees
of freedom. The sampling and computation were done 1000 times and the
proportion of rejections yielded an estimate of the power.

For the Mann-Whitney U-test, the same observations as were used for
the t-test were ranked and the U-statistic computed on the ranks of one of
the samples. The ranks were then permuted and the U-statistic computed
for every possible permutation. The original U-statistic was then compared
to the distribution of U-values obtained from the permutations, and if the
original U-statistic was in the $100\alpha$-percent rejection region the null
hypothesis of equal means was rejected. The sampling and computation
were done 1000 times and the proportion of rejections yielded an estimate
of the power.

For the permutation t-test, the t-statistic was computed for the
original observations. The observations were then permuted and the
t-statistic computed for every possible permutation. The original t-statistic
was then compared to the distribution of t-values obtained from the
permutations, and if the original t-statistic was in the $100\alpha$-percent
rejection region the null hypothesis of equal means was rejected. The
sampling and computation were done 1000 times and the proportion of
rejections yielded an estimate of the power.

When $\theta$ was equal to zero, the proportion of rejections obtained in the
three procedures outlined above was an estimate of the probability of a
Type I error for the statistical procedure.
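
The permutation procedures just described can be sketched as follows. This is only an illustrative Python outline under stated assumptions, not the FORTRAN program of the study: formula (1) is not reproduced in this section, so the usual pooled-variance two-sample t is assumed; Python's `random.gauss` stands in for the normal generator; `k` denotes the number of points in the upper-tail rejection region, so that the nominal level is k divided by the number of possible splits; and all function names are hypothetical.

```python
import random
from itertools import combinations
from math import copysign, inf, sqrt


def t_statistic(x, y):
    """Pooled-variance two-sample t (assumed form of formula (1))."""
    n, m = len(x), len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / m
    ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    se = sqrt(ss / (n + m - 2) * (1 / n + 1 / m))
    diff = mean_y - mean_x
    if se == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def u_statistic(x, y):
    """Number of (X_i, Y_j) pairs with Y_j > X_i."""
    return sum(1 for xi in x for yj in y if yj > xi)


def permutation_rejects(x, y, k, statistic):
    """Reject when at most k of the possible splits of the pooled data
    give a statistic at least as large as the observed one."""
    pooled = list(x) + list(y)
    n = len(x)
    observed = statistic(x, y)
    at_least = 0
    for idx in combinations(range(len(pooled)), n):
        chosen = set(idx)
        px = [pooled[i] for i in idx]
        py = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if statistic(px, py) >= observed - 1e-12:
            at_least += 1
    return at_least <= k


def estimate_rejection_rate(n, m, k, theta, statistic, reps=1000, seed=0):
    """Proportion of rejections over reps pairs of N(0,1) samples,
    with theta added to every observation of the second sample."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) + theta for _ in range(m)]
        rejections += permutation_rejects(x, y, k, statistic)
    return rejections / reps
```

With `theta=0.0` the returned proportion estimates the probability of a Type I error; with a positive `theta` it estimates the power. Passing `u_statistic` instead of `t_statistic` gives the rank version of the same procedure, since U depends on the data only through their order.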

The empirical power and probability of a Type I error for the
permutation t-test, Student's t-test and the Mann-Whitney U-test were
obtained for normal, uniform and skewed populations. The three distributions
of interest were obtained by use of random number generators and a digital
computer. To obtain results for the normal population, random samples of
size m and n were drawn from the unit normal distribution, N(0,1), by use of
a random number generator, RANSS (see UWCC User's Manual), and the Control
Data 3600 computer. RANSS generates random standard normal deviates by a
method which uses pseudo-random odd integers distributed uniformly in the
interval $(0, 2^{35})$. The uniformly distributed numbers are generated by a
power-residue method (Hull and Dobell, 1962). The procedure uses a starting
integer value, $X_0$, specified by the user, an integer, $a = 5^{13}$, and
another integer, $m = 2^{35}$, called the modulus. A sequence, $X_i$, of
non-negative integers is then defined by the congruence relationship

$$X_i = 5^{13} X_{i-1} \pmod{2^{35}},$$

or in general

$$X_i = a X_{i-1} \pmod{m} \qquad (6)$$



The method described above is called a power residue method of generating
random numbers. The power residue method meets all statistical
requirements, i.e., independence of successive values, and numbers
distributed as desired as determined by a chi-square test, and it also meets
the requirement of a long series of numbers without repetition (see Hull and
Dobell, 1962, and IBM, 1959). The power residue method is considered to be
satisfactory if it is used correctly (IBM, 1959). A series of numbers
produced by a pseudo-random number generator will eventually repeat. Proper
use of the power residue method involves choosing the starting value, $X_0$,
the multiplicative constant, a, and the modulus, m, so that they have
qualities which yield a long series, $X_i$. The following limitations, when
placed upon the parameters of the congruence relation (6), will yield the
longest series of numbers, which will also have good properties statistically:



a) choose $m = 2^b$;

b) $X_0$ must be odd and relatively prime to $2^b$;

c) a must be of the form $a = 8c - 3$; that is, $a + 3 = 8c$, or
   $c = (a+3)/8$ must be an integer.

If the above limitations are placed on the parameters of the congruence
relation (6), the generator will produce $2^{b-2}$ terms before repeating. The
RANSS generator has $m = 2^{35}$ and $a = 5^{13}$, which meets the
requirements because $c = (5^{13}+3)/8$ is an integer. The choice of $X_0$ odd
and relatively prime to $2^{35}$ yields a series which has $2^{33}$
pseudo-random numbers before repeating. Thus, on the order of eight billion
numbers may be produced before repeating, which is deemed adequate for the
present study.
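
The recurrence (6) and the period claim can be checked on a small scale with the Python sketch below. It is an illustration only: the function names are hypothetical, and a small modulus is used so that a full cycle can be counted quickly, whereas the generators described in the text work with far larger moduli.

```python
def power_residue(x0, a, modulus):
    """Yield X_i = a * X_{i-1} (mod modulus), the recurrence in (6)."""
    if x0 % 2 == 0:
        raise ValueError("the starting value must be odd")
    x = x0
    while True:
        x = (a * x) % modulus
        yield x


def period(x0, a, modulus):
    """Count the terms generated before the sequence first repeats."""
    gen = power_residue(x0, a, modulus)
    first = next(gen)
    count = 1
    for value in gen:
        if value == first:
            return count
        count += 1
```

With a modulus of 2^8, an odd starting value, and a multiplier such as 5 (for which (a + 3)/8 is an integer), `period` returns 2^6 = 64, one quarter of the modulus, in line with the period stated above.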

The random normal generator, RANSS, then uses the $X_i$ values to form
a normally distributed random variable. If $X_i$ is the i-th variable, with
mean $\mu$ and standard deviation $\sigma$, then

$$Y_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma \sqrt{n}}$$

is distributed normally with mean = 0 and variance = 1, N(0,1), as n
approaches infinity due to the Central Limit Theorem (see Mood and Graybill,
1963).

With $n \ge 16$, the approximation of Y to N(0,1) is adequate. Thus, n
is taken equal to sixteen, the multiplication and reduction (mod $2^{35}$) is
repeated sixteen times and the variable Y is returned as the pseudo-random
variable distributed N(0,1).
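
A minimal Python sketch of this sum-of-sixteen construction follows. It assumes the uniform values have already been scaled to the interval (0,1), so that each has mean 1/2 and variance 1/12, and it uses Python's own generator in place of the power residue generator; the function name is hypothetical.

```python
import random
from math import sqrt


def approx_standard_normal(uniform, n=16):
    """Standardize the sum of n uniform(0,1) draws.

    The sum has mean n/2 and variance n/12, so the returned value has
    mean 0 and variance 1 and is approximately normal for n = 16.
    """
    total = sum(uniform() for _ in range(n))
    return (total - n / 2) / sqrt(n / 12)


# Example use with Python's generator standing in for the uniform source.
rng = random.Random(1)
sample = [approx_standard_normal(rng.random) for _ in range(5)]
```

One consequence of the construction is that the generated values are bounded: with n = 16 no value can exceed about 6.93 in absolute size, which is of little practical importance for samples of the size used here.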

The analysis for the rectangular population was begun by drawing random
samples of size m and n from the unit uniform distribution by use of the
random number generator RANF (CDC, 1966). RANF generates random numbers in
the interval (0,1) by utilizing a power residue method similar to that
described above. The parameters of the congruence relation (6) are as
follows: m = 2^[illegible] and a = [illegible]. The parameters of RANF meet
the requirements above if the starting value is an odd integer and relatively
prime to 2^[illegible]. A sequence of non-negative integers is defined by:

X_i = 5^[illegible] X_{i-1} (mod 2^[illegible])    (7)

which are uniformly distributed in the interval (0, 2^[illegible]). To obtain
floating point numbers distributed in the interval (0,1), the value of
Y_i = (X_i + 1)/2^[illegible] is calculated and returned to the user.

The pseudo-random uniformly distributed numbers returned by RANF were
then scaled so that the variance of the population would be unity, the same
as the variance of the normal population. The variance of a uniform
distribution is given as

$$\sigma^2 = \frac{(a-b)^2}{12} \qquad (8)$$

where a and b are the limits of the distribution.

To obtain $\sigma^2 = 1$, $(a-b)^2$ must equal twelve and a-b must equal the
square root of twelve. RANF returns values distributed uniformly in the
interval (0,1). If each value returned is multiplied by $\sqrt{12} = 3.46$,
then the value returned will be distributed uniformly in the interval
(0,3.46) and the variance will be approximately one.
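
The effect of the scaling can be verified with a short calculation. In the sketch below, W is an assumed symbol for a value uniformly distributed on (0,1) and c for the scaling constant; neither symbol appears in the original.

```latex
\[
\operatorname{Var}(W) = \frac{(1-0)^2}{12} = \frac{1}{12},
\qquad
\operatorname{Var}(cW) = c^2 \operatorname{Var}(W) = \frac{c^2}{12} = 1
\;\Longrightarrow\; c = \sqrt{12} \approx 3.4641,
\]
\[
E(cW) = \frac{c}{2} = \frac{\sqrt{12}}{2} = \sqrt{3} \approx 1.732 .
\]
```

The rounded multiplier 3.46 gives a variance of 3.46^2/12, about 0.998, which is why the variance is described as approximately one.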

The skewed population was derived from a chi-square distribution with
three degrees of freedom. The mean and the second and third central moments
of the chi-square distribution are v, 2v, and 8v, where v is the degrees of
freedom (Kendall and Stuart, 1967, p. 370). The skewness measure

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \qquad (9)$$

is then approximately 1.633 for the chi-square with three degrees of
freedom. The distribution is unimodal with a positive skew and mean and
variance of three and six, respectively.

Since a chi-square variate with N degrees of freedom is defined as

$$\chi^2_{(N)} = \sum_{i=1}^{N} Z_i^2 \qquad (10)$$

where $Z_i$ is distributed N(0,1), the sum of squares of N unit normal
variables is distributed as chi-square with N degrees of freedom. A chi-square
variable with three degrees of freedom was generated by calling the unit
normal random number generator, RANSS, three times, squaring each unit
normal variable, and summing.

The pseudo-random chi-square distributed numbers were then scaled so
that the variance of the skewed population would be unity, the same as the
variance of the normal population. The variance of a chi-square
distribution with three degrees of freedom is six, so each chi-square value
was multiplied by $1/\sqrt{6}$, yielding a skewed population with mean equal
to $3/\sqrt{6}$ and variance equal to one. The skewness measure is still
equal to 1.633.
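
The generation and scaling of the skewed variates might be sketched as below. This is an illustrative Python outline with hypothetical function names, using Python's `random.gauss` in place of the unit normal generator named in the text.

```python
import random
from math import sqrt


def scaled_chi_square_3(normal):
    """Sum three squared unit-normal draws, then divide by sqrt(6)
    so that the result has variance one (and mean 3 / sqrt(6))."""
    return sum(normal() ** 2 for _ in range(3)) / sqrt(6)


def skewness(mu2, mu3):
    """Skewness measure: third central moment over mu2 to the 3/2 power."""
    return mu3 / mu2 ** 1.5


# Skewness of chi-square with v = 3 from the central moments 2v and 8v.
gamma_1 = skewness(2 * 3, 8 * 3)

rng = random.Random(2)
draws = [scaled_chi_square_3(lambda: rng.gauss(0, 1)) for _ in range(20000)]
```

Because the skewness measure is unchanged by multiplication by a positive constant, the scaled variates keep the skewness of the original chi-square distribution.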

The above generation techniques yielded variates distributed as a 
normal distribution, a uniform distribution and a skewed distribution, 
respectively. 

To obtain results for the probability of a Type I error for the above
distributions, the values of the probability of a Type I error were chosen
for sample sizes such that $\alpha = k / \binom{m+n}{n}$, where k is chosen
such that $\alpha$ is close to .05 and $\alpha < .05$ if possible. By
choosing theoretical values of the probability of a Type I error in this
manner, the empirical probability of a Type I error will vary greatly with
sample sizes, but will be much more accurate for a given pair of sample sizes
than had the probability of a Type I error been chosen such that
$\alpha < .05$ for all sample sizes. Also, certain values of the sample
sizes, such as samples of sizes two and three, could not possibly yield
values of theoretical probability of a Type I error less than .05 and would
have had to have been left out of the study. Such a choice of the theoretical
probability of a Type I error also made the specification of the power
considerably easier, since the exact probability of a Type I error and thus
the exact critical value could be obtained.
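
One way to see which values of k/C(m+n, n) are available is to enumerate the exact upper-tail probabilities of U. The Python sketch below does this and then applies a selection rule that is an assumption of this illustration, not a rule stated in the text: take the largest attainable level not exceeding .05, or the smallest attainable level when none is that small. Function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def attainable_levels(n, m):
    """Upper-tail probabilities P(U >= u) of the exact null distribution."""
    ranks = range(1, n + m + 1)
    total = comb(n + m, n)
    counts = {}
    for x_ranks in combinations(ranks, n):
        # For untied ranks, the number of (X, Y) pairs with Y > X equals
        # nm + n(n+1)/2 minus the rank sum of the X sample.
        u = n * m + n * (n + 1) // 2 - sum(x_ranks)
        counts[u] = counts.get(u, 0) + 1
    levels, running = [], 0
    for u in sorted(counts, reverse=True):
        running += counts[u]
        levels.append(Fraction(running, total))
    return levels


def choose_alpha(n, m, nominal=Fraction(1, 20)):
    """Assumed rule: largest attainable level <= nominal, else the smallest."""
    levels = attainable_levels(n, m)
    below = [p for p in levels if p <= nominal]
    return max(below) if below else min(levels)
```

For instance, `choose_alpha(2, 3)` returns 1/10, since with only ten possible splits no level below .10 can be attained, while `choose_alpha(4, 4)` returns 2/70, about .0286.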

To obtain the results for the power for the three statistical
procedures, $\theta > 0$ was defined such that the levels of power of
Student's t-test would be .30, .60, and .90 for the normal distribution. The
defined $\theta$ was used for all three statistical procedures and for all
three distributions.

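Specification of $\theta$ for the normal distribution was made through
the definition of the non-centrality parameter, $\delta^2$, for the
non-central t-distribution as given by Scheffe (1959, p. 41),

$$\sigma^2 \delta^2 = \psi' B^{-1} \psi \qquad (11)$$

where $\psi$ is the column vector of contrasts on the cell means and
$B = \sigma^{-2} \Sigma_{\hat{\psi}}$, where $\Sigma_{\hat{\psi}}$ is the
variance of the desired contrast. Since the t-test deals with the difference
between means, the contrast desired is $\mu_1 - \mu_2$,

so

$$\psi = (\mu_1 - \mu_2),$$

then

$$B = \frac{1}{m} + \frac{1}{n}, \qquad
B^{-1} = \left( \frac{1}{m} + \frac{1}{n} \right)^{-1},$$

and

$$\sigma^2 \delta^2 = (\mu_1 - \mu_2)^2 \left( \frac{1}{m} + \frac{1}{n} \right)^{-1}$$

or

$$\sigma^2 \delta^2 = \frac{(\mu_1 - \mu_2)^2}{\frac{1}{m} + \frac{1}{n}} \qquad (12)$$

Setting $\sigma^2 = 1$, and solving for $\mu_1 - \mu_2$ yields

$$\theta = \mu_1 - \mu_2 = \delta \sqrt{\frac{1}{m} + \frac{1}{n}} \qquad (13)$$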
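
Solving (12) for the difference in means gives a shift equal to the non-centrality parameter times the square root of 1/m + 1/n when the variance is one. A small Python sketch of that conversion follows; the function names are hypothetical, and the value of the non-centrality parameter is assumed to be supplied from elsewhere, as it was in the study.

```python
from math import sqrt


def shift_from_noncentrality(delta, m, n, sigma=1.0):
    """Difference in means implied by non-centrality delta, from (12)."""
    return sigma * delta * sqrt(1 / m + 1 / n)


def noncentrality_from_shift(theta, m, n, sigma=1.0):
    """Inverse relation: delta for a given difference in means theta."""
    return theta / (sigma * sqrt(1 / m + 1 / n))
```

For equal sample sizes of four, for example, a non-centrality parameter of 2 corresponds to a shift of 2 times the square root of 1/2, about 1.414 standard deviations.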

Several FORTRAN subprograms were used to obtain the values of $\theta$,
which were utilized in the main program. First, the exact t-value was
obtained for the exact probability of a Type I error for given sample sizes
through use of a subprogram written to compute exact probabilities for the
F-distribution (see Baker and Collier, 1966b). The obtained t-value and
the desired probability of a Type II error (1 - desired power value) were
used in another subprogram written by Milton (see UWCC User's Manual under
"New Subprograms") to yield the appropriate non-centrality parameter,
$\delta$, for those sample sizes. Given $\delta$, m and n, the value of
$\theta$ was computed. The power results could be obtained by drawing one of
the samples from a distribution with mean $\mu + \theta$ and the other from a
distribution with mean $\mu$. One method for achieving the desired result
would be to alter the random number generators. However, it was not necessary
to alter the random number generators to sample from populations with means
$\mu + \theta$ for the following reason: if a constant $\theta$ is added to
every score in a distribution with mean $\mu$, the mean of the new
distribution is simply $\mu + \theta$. Thus, by sampling from a distribution
with mean $\mu$ and adding the defined $\theta$ to each value obtained, the
result is the same as if sampling had been done from a distribution with mean
$\mu + \theta$. For the normal population, $\theta = \mu_1$ because
$\mu_2 = 0$. The samples drawn for the power results were as if they
had been drawn from the normal distributions N(0,1) and N($\theta$,1), from
the rectangular distribution f2(x) and f1(x+$\theta$) and from the skewed
distribution f2(x) and f1(x+$\theta$), where $\theta$ is defined as in (13)
above. The values of $\theta$ for all sample sizes considered in the present
study are given in Appendix

The sample sizes considered in the present research are the nine 
arrangements of (2,3), (2,4), (2,5), (3,3), (3,4), (3,5), (4,4), (4,5) and 
(5,5). These sample sizes were part of a larger set originally chosen 
because of existing exact probabilities of the Mann-Whitney U-statistic 
in table form. Consideration of computing time and programming difficulty 
then narrowed the range of sample sizes to the above set. 

The empirical small sample power and size for the permutation t-test,
Student's t-test and the Mann-Whitney U-test were obtained by means of a
computer program written for this purpose by the author. The program
MONTE1 was written in FORTRAN and was run on the Control Data Corporation
3600 computer. For a given sample size the program is designed to draw
samples from the appropriate population, add the specified $\theta$ (null or
non-null) to the data, complete the permutation procedure for the U-test and
the permutation t-test, and complete the normal theory test for Student's
t-test by use of the appropriate value from the t-distribution. The program
is designed to then repeat the entire procedure 1000 times. The number of
samples to be drawn was determined strictly by consideration of the computing
time involved. The number 1000 was the largest possible number of samples
which could be analyzed without using an inordinate amount of computer
time. After the 1000 samples have been drawn, the program is designed
to then print out the estimated probability of a Type I error and power
of each of the three tests for the given sample size. In addition, it
was thought advisable to check for influence of the size of the sample
to which $\theta$ was added, so two sets of power values are printed, one
set for $\theta$ being added to the larger of the two samples and one set for
$\theta$ being added to the smaller of the two samples.

III

RESULTS

Probability of a Type I Error

The empirical values of the probability of a Type I error of Student's
t-test, the Mann-Whitney U-test, and the permutation t-test for various
small sample sizes from the previously specified normal, uniform, and
skewed populations are given in Table 1 as evidence verifying the Monte
Carlo procedures. The theoretical probability of a Type I error is given
as $\alpha$. Only two empirical values of the probability of a Type I error
in Table 1 were larger than that expected from sampling variability. For
sample size (4,4) from the skewed population, the values of .044 and .044
for the Mann-Whitney U-test and the permutation t-test were more than
$2\sigma_p$ larger than .0286, the theoretical $\alpha$. For equal sample
sizes from the skewed population, there was a trend of empirical values of
the probability of a Type I error for the Mann-Whitney U-test and the
permutation t-test which were larger than both the theoretical $\alpha$ and
the value for Student's t-test. Also, for unequal sample sizes from the
skewed population, the values for the Mann-Whitney U-test and the permutation
t-test followed the opposite trend; that is, they were less than the
theoretical $\alpha$ in four of the six cases and less than the value for
Student's t-test in five of the six cases.
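
The sampling-variability criterion can be illustrated with a short Python sketch. It assumes that the standard error in question is the binomial standard error of a proportion based on 1,000 samples, an assumption which reproduces the standard errors listed in Table 1; the function names are hypothetical.

```python
from math import sqrt


def sigma_p(alpha, samples=1000):
    """Binomial standard error of an empirical rejection proportion."""
    return sqrt(alpha * (1 - alpha) / samples)


def exceeds_two_sigma(observed, alpha, samples=1000):
    """True when observed differs from alpha by more than 2 sigma_p."""
    return abs(observed - alpha) > 2 * sigma_p(alpha, samples)
```

For the (4,4) case, the theoretical level is 2/70 and its standard error is about .0053, so an empirical value of .044 lies more than two standard errors above it, while the value of .039 for Student's t-test does not.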

The remaining values of the empirical probability of a Type I error
were within the bounds of sampling variation, and there were no other
consistent trends evident in the empirical values. The equality of the
empirical values in Table 1 for the Mann-Whitney U-test and the
permutation t-test for all sample sizes other than (4,5) and (5,5) is due to
the fact that when the number of combinations is small, the rejection
region for both tests contains a very small number of points. Thus, only
a few combinations of the data result in a rejection with the
permutation t-test, and the same exact combinations are the ones which yield
rank sums large enough to cause a rejection with the U-test. When the
sample size gets larger, such as (4,5) and (5,5), there are more points
in the rejection region and therefore a greater chance for a combination of
the data to reject on one test and not on the other.

TABLE 1

The Empirical Probability of a Type I Error for Three Two-Sample Statistics,
for Three Parent Populations, and for Various Sample Sizes (the values in
the table are the proportion of rejections in 1,000 random samples)

Sample Size and                  Statistical
$\sigma_p$ Values    $\alpha$    Test              Normal   Uniform   Skewed

(2,3)                 .10        Student's t        .105     .109      .104
$\sigma_p$ = .0095               Mann-Whitney U     .109     .105      .094
                                 Permutation t      .109     .105      .094

(3,3)                 .05        Student's t        .055     .056      .046
$\sigma_p$ = .0069               Mann-Whitney U     .056     .051      .053
                                 Permutation t      .056     .051      .053

(2,4)                 .0667      Student's t        .079     .072      .073
$\sigma_p$ = .0079               Mann-Whitney U     .074     .065      .059
                                 Permutation t      .074     .065      .059

(3,4)                 .0286      Student's t        .034     .037      .033
$\sigma_p$ = .0053               Mann-Whitney U     .031     .027      .028
                                 Permutation t      .031     .027      .028

(4,4)                 .0286      Student's t        .029     .025      .039
$\sigma_p$ = .0053               Mann-Whitney U     .029     .021      .044^a
                                 Permutation t      .029     .021      .044^a

(2,5)                 .0476      Student's t        .046     .043      .055
$\sigma_p$ = .0067               Mann-Whitney U     .049     .050      .043
                                 Permutation t      .049     .050      .043

(3,5)                 .0357      Student's t        .038     .034      .047
$\sigma_p$ = .0059               Mann-Whitney U     .039     .036      .041
                                 Permutation t      .039     .036      .041

(4,5)                 .0317      Student's t        .026     .037      .035
$\sigma_p$ = .0055               Mann-Whitney U     .026     .033      .040
                                 Permutation t      .024     .032      .039

(5,5)                 .0476      Student's t        .055     .054      .045
$\sigma_p$ = .0067               Mann-Whitney U     .055     .050      .050
                                 Permutation t      .055     .052      .050

^a The observed empirical probability is more than $2\sigma_p$ from the
theoretical $\alpha$.

Power 

The values of the empirical power of Student's t-test, the Mann- 
Whitney U-test, and the permutation t-test for various small sample 
sizes from the previously specified normal, uniform, and skewed popu- 
lations are presented in Table 2. As was the case with the probability 
of a Type I error, the values of empirical power for the permutation t-test 
and the Mann-Whitney U-test are identical within each population for 
sample sizes smaller than (4,5). 

For the normal and uniform populations, the power of Student's
t-test was generally larger than the power of the permutation t-test
for both the "small" and "large sample addition procedure" and for all
sample sizes. Of the 108 cases available (three levels of $\theta$, nine
sample sizes for the large and small sample addition procedures for each of
two populations) there were 102 cases where the power of Student's t-test
was larger than that of the permutation t-test and 37 which were larger
than expected by chance, with large differences occurring for the uniform
distribution. Of the six cases where the power of the permutation t-test
was larger than that of Student's t-test, two occurred for sample size
(5,5). Also, for sample sizes (4,5) and (5,5) the power values of the
permutation t-test were generally closer to those of Student's t-test
than was true for the smaller sample sizes. For sample sizes (4,5)
and (5,5) the values of empirical power of the permutation t-test were
usually larger than those of the Mann-Whitney U-test.

For the "large sample addition procedure" when sampling from the
skewed population, the empirical power values for the permutation t-test
were greater than those of Student's t-test for fifteen of the
twenty-seven cases available (three levels of θ, nine sample sizes). The
seven differences which were larger than expected from sampling variation
were for unequal sample sizes with either the small or medium levels of
power. For example, for sample size (2,3) with small θ, the values
.354 for the permutation t-test and .309 for Student's t-test are more
than 2σ_p apart and thus are most likely due to something other than
sampling variation. Other large differences occurred for sample size
(3,5) with small and medium θ. For samples of equal size, (3,3),
(4,4), and (5,5), or near equal size, (3,4) and (4,5), the differences
between the power values for Student's t-test and the permutation t-test
were small.

For the "small sample addition procedure" when sampling from the
skewed population, that is, when the smaller sample came from the skewed
population with the larger mean (μ + θ), the empirical power values for the
permutation t-test were greater than or equal to those for Student's
t-test for only six of the twenty-seven comparisons available. In fact,
thirteen of the twenty-seven comparisons showed a difference larger than
sampling variation, with Student's t-test having the larger power value.
The differences in favor of Student's t-test were the largest for unequal
sample sizes, and only for equal sample sizes (3,3) and (5,5) did the
power values of the permutation t-test approach or exceed those of
Student's t-test.
IV

SUMMARY AND CONCLUSIONS 



The results presented above for the permutation t-test show that the
empirical probability of a Type I error for repeated sampling from a
normal or uniform population was generally close to the theoretical α.
The empirical probability of a Type I error for repeated sampling from
the specified skewed population was generally close to the theoretical α
but showed one sample size which had inexplicably divergent results
for the permutation t-test and the Mann-Whitney U-test. This discrepancy
was part of a trend of other discrepancies which were within the bounds
of sampling variation. The empirical results for the power showed that
the permutation t-test generally had smaller power than Student's t-test
for the uniform and normal populations. For the skewed population the
permutation t-test generally had higher power values than Student's
t-test if the larger sample were drawn from the population with the larger
mean (μ + θ). If the samples were of equal size, the permutation t-test
generally had power values which were close to those of Student's t-test
but did not exceed them. However, if the smaller sample were drawn
from the skewed population with the larger mean (μ + θ), then the power
values of Student's t-test were larger than those of the permutation
t-test. For all three populations, the power of the permutation t-test
approached that of Student's t-test as sample size increased, even
for samples as small as (4,5) and (5,5). The increase in power was more
rapid for the permutation t-test than for the Mann-Whitney U-test, and the
power of the permutation t-test was always greater than or equal to that
of the Mann-Whitney U-test.

Using Student's t as the test statistic for the permutation test
for the two-sample problem gives a statistical procedure which not
only has an ARE of one for the normal population but has very close
agreement with Student's t-test for small samples. The agreement is
indicated by the closeness of values of empirical power and probability
of a Type I error for the permutation t-test when compared to those of
Student's t-test for the normal population. Similar results show that
the permutation t-test is in close agreement with Student's t-test
for the uniform population. However, the empirical power of the
permutation t-test for the skewed population showed that the permutation
t-test could have higher power than Student's t-test if the sample sizes
were proportional to the population means when the parent population has
the specific skewed distribution with γ₁ = 1.633 and γ₂ = 4. The present
study also gives further support to the knowledge that Student's t-test is
generally robust to the violation of the normality assumption, even for
very small samples.
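
The use of Student's t as the statistic in a permutation test can be illustrated with a short sketch. It is not the program used in this study. It assumes a one-sided alternative in which the first sample has the larger mean, it enumerates combinations rather than permutations, and the function names and data values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def student_t(x, y):
    """Pooled-variance two-sample t statistic for samples x and y."""
    m, n = len(x), len(y)
    mean_x, mean_y = sum(x) / m, sum(y) / n
    ss = sum((v - mean_x) ** 2 for v in x) + sum((v - mean_y) ** 2 for v in y)
    pooled_var = ss / (m + n - 2)
    # Assumes pooled_var > 0, i.e. the observations are not all tied within groups.
    return (mean_x - mean_y) / sqrt(pooled_var * (1 / m + 1 / n))


def permutation_t_test(x, y):
    """One-sided exact p-value: the share of arrangements with t >= observed t."""
    pooled = list(x) + list(y)
    m = len(x)
    observed = student_t(x, y)
    count = 0
    total = 0
    for chosen in combinations(range(len(pooled)), m):
        chosen_set = set(chosen)
        new_x = [pooled[i] for i in chosen]
        new_y = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        # Small tolerance so the observed arrangement always counts itself.
        if student_t(new_x, new_y) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical data for sample sizes (2,3); the observed split is the most extreme.
print(permutation_t_test([4.8, 5.1], [0.3, 1.9, 2.2]))  # 0.1
```

With sample sizes (2,3) there are only ten combinations, so the smallest attainable one-sided p-value in this sketch is 1/10.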

The present research indicates that the permutation t-test is an 
acceptable statistical procedure for the two-sample problem for the 
normal and uniform populations and suggests that it might be more desirable 
than the traditional Student's t-test when sample sizes are proportional 
to the means and the parent population is nonnormal and asymmetric. 
Further research is needed before a more definite statement can be made 
about the permutation t-test when sampling from nonnormal populations.

Appendix A

An Example on Permutations and Combinations

For example, consider m = n = 2 and ΣX as the statistic:

Permutations

    X     Y     ΣX
    12    34    3
    12    43    3
    21    34    3
    21    43    3
    13    24    4
    13    42    4
    31    24    4
    31    42    4
    14    23    5
    14    32    5
    41    23    5
    41    32    5
    23    14    5
    23    41    5
    32    14    5
    32    41    5
    24    13    6
    24    31    6
    42    13    6
    42    31    6
    34    12    7
    34    21    7
    43    12    7
    43    21    7

Combinations

    X     Y     ΣX
    12    34    3
    13    24    4
    14    23    5
    23    14    5
    24    13    6
    34    12    7

For Both Permutations and Combinations

    ΣX    P(ΣX)
    3     1/6
    4     1/6
    5     2/6
    6     1/6
    7     1/6
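
The equivalence shown in this appendix can be verified by enumeration. The following minimal sketch assumes the observations 1, 2, 3, 4 with m = n = 2, as in the example above. The helper name `distribution` is illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

observations = (1, 2, 3, 4)
m = 2  # size of the X sample; the remaining n = 2 observations form Y


def distribution(sums):
    """Turn a list of statistic values into exact relative frequencies."""
    counts = Counter(sums)
    return {value: Fraction(count, len(sums)) for value, count in sorted(counts.items())}


# All 4! = 24 orderings; the first m entries are X, the rest are Y.
perm_sums = [sum(p[:m]) for p in permutations(observations)]
# All C(4, 2) = 6 unordered choices of the X sample.
comb_sums = [sum(c) for c in combinations(observations, m)]

perm_dist = distribution(perm_sums)
comb_dist = distribution(comb_sums)

print(len(perm_sums), len(comb_sums))  # 24 6
print(perm_dist == comb_dist)          # True
```

Each combination corresponds to m! n! = 4 permutations with the same value of ΣX, which is why the two enumerations give the same distribution.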
Appendix B
Values of θ for Various Sample Sizes

    Sample     Small      Medium     Large
    sizes      θ          θ          θ

    (2,3)       .7254     1.6327     2.7726
    (3,3)      1.0888     1.8783     2.9390
    (2,4)       .9631     1.7790     2.8554
    (3,4)      1.2758     2.0195     3.0243
    (4,4)      1.1418     1.8011     2.6854
    (2,5)      1.1026     1.8765     2.9084
    (3,5)      1.0745     1.7427     2.6354
    (4,5)      1.0145     1.6175     2.4214
    (5,5)       .7871     1.3336     2.0546